Representative Pattern Extraction for Nucleotide Sequence Groups Using Base Frequency Differences
Abstract
Advances in high-throughput technology in molecular biology have produced large amounts of sequence data on various organisms. Some organisms, such as viruses, show considerable variation in their nucleotide sequences and can be categorized into several subtypes. A sequential pattern that characterizes a subtype and discriminates it from other subtypes is called a signature. This paper proposes a method that extracts signatures from a collection of sequence data. Based on the position-specific relative base frequency difference between one subtype's data set and the other subtype's data set, the proposed method examines the discrimination capability of potential signatures. A tool implementing the proposed method has been developed and applied in an experiment to extract signatures for HIV-1 subtypes.
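The abstract does not give the exact scoring formula, so the following is only a minimal sketch of the general idea, assuming pre-aligned, equal-length sequences over the bases A, C, G and T. The function names (`base_frequencies`, `frequency_differences`, `candidate_signature`) and the `threshold` parameter are hypothetical and are not taken from the paper or its tool. At each position the sketch computes the relative frequency of every base in a target subtype and in a comparison set, takes the difference, and keeps a base in the candidate signature only when that difference is large enough.

```python
from collections import Counter

BASES = "ACGT"  # assumption: plain nucleotide alphabet, no ambiguity codes


def base_frequencies(seqs):
    """Return per-position relative base frequencies for aligned sequences."""
    length = len(seqs[0])
    total = len(seqs)
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        freqs.append({b: counts.get(b, 0) / total for b in BASES})
    return freqs


def frequency_differences(target, others):
    """Per-position difference: frequency in target minus frequency in others."""
    ft = base_frequencies(target)
    fo = base_frequencies(others)
    return [{b: ft[i][b] - fo[i][b] for b in BASES} for i in range(len(ft))]


def candidate_signature(target, others, threshold=0.5):
    """Keep the most over-represented base per position if its difference
    reaches the (hypothetical) threshold; otherwise emit '-'."""
    signature = []
    for diff in frequency_differences(target, others):
        best = max(BASES, key=lambda b: diff[b])
        signature.append(best if diff[best] >= threshold else "-")
    return "".join(signature)


# Toy data, invented purely for illustration
target = ["ACGT", "ACGA", "ACGT"]
others = ["TCGT", "TCGA", "TCCT"]
print(candidate_signature(target, others))        # A---
print(candidate_signature(target, others, 0.3))   # A-G-
```

In this toy example only the first position separates the two groups cleanly, so it is the only one kept at the default threshold; lowering the threshold admits weaker, less discriminating positions. How the published method actually scores and selects positions may differ.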